Content On This Page
  • Correlation: Definition and Types (Positive, Negative, Zero)
  • Scatter Diagram
  • Methods of Measuring Correlation (Karl Pearson's Coefficient - Implicit)
  • Rank Correlation (Spearman's Rank Correlation Coefficient - Implicit)


Correlation




Correlation: Definition and Types (Positive, Negative, Zero)


Definition

Correlation is a statistical concept that measures the **strength** and **direction** of a **linear relationship** between two quantitative variables. When we examine two variables, say $X$ and $Y$, correlation tells us how consistently changes in one variable are associated with changes in the other, specifically in a straight-line pattern.

The strength of the correlation indicates how closely the relationship follows a perfect linear pattern. A strong correlation means the points lie very close to a straight line, while a weak correlation means they are scattered more broadly around a line.

Important Considerations:


Types of Correlation based on Direction

Based on the direction of the linear relationship, correlation can be classified into three main types:

  1. Positive Correlation:

    Positive correlation exists when two variables tend to move in the same direction. As one variable increases, the other variable also tends to increase, and vice versa.

    • If plotted on a scatter diagram, the points will generally cluster around a line that slopes upwards from left to right.
    • Example: As the number of hours studied increases, the test scores tend to increase. As temperature rises, ice cream sales tend to increase. Height and weight often show a positive correlation.
    Scatter plot showing a positive linear relationship
  2. Negative Correlation:

    Negative correlation exists when two variables tend to move in opposite directions. As one variable increases, the other variable tends to decrease, and vice versa.

    • If plotted on a scatter diagram, the points will generally cluster around a line that slopes downwards from left to right.
    • Example: As altitude increases, air pressure tends to decrease. As the price of a product increases, the quantity demanded by consumers tends to decrease. The number of hours spent watching TV and the number of hours spent exercising might show a negative correlation.
    Scatter plot showing a negative linear relationship
  3. Zero Correlation (or No Linear Correlation):

    Zero correlation (or negligible linear correlation) exists when there is no discernible linear relationship between the two variables. Changes in one variable are not consistently associated with either an increase or a decrease in the other variable in a linear pattern.

    • If plotted on a scatter diagram, the points will appear randomly scattered, forming no clear linear pattern (neither upwards nor downwards).
    • Example: The relationship between a person's shoe size and their IQ. The relationship between hair colour and marks in a statistics exam.
    • It is important to remember that zero linear correlation does not rule out the possibility of a strong non-linear relationship.
    Scatter plot showing no clear linear relationship

The **strength** and direction of the correlation are summarized by a correlation coefficient (such as Karl Pearson's coefficient, discussed later). The coefficient ranges from -1 to +1. A value of +1 indicates a perfect positive linear correlation, -1 indicates a perfect negative linear correlation, and 0 indicates no linear correlation. Values closer to +1 or -1 indicate stronger linear relationships, while values closer to 0 indicate weaker linear relationships.
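
As a rough illustration of these boundary values, the following Python sketch computes the coefficient from its standard deviation-about-the-mean definition for three small made-up datasets. The function name `pearson_r` and all data values are assumptions chosen for illustration; they do not come from the text above.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means (assumes neither list is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))   # sum of products of deviations
    sxx = sum(a * a for a in dx)               # sum of squared x deviations
    syy = sum(b * b for b in dy)               # sum of squared y deviations
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]                            # illustrative values only
r_up = pearson_r(xs, [2 * x + 1 for x in xs])     # points on a rising line
r_down = pearson_r(xs, [-3 * x + 4 for x in xs])  # points on a falling line
r_curve = pearson_r(xs, [x * x for x in xs])      # exact parabola y = x^2
```

Here `r_up` is 1, `r_down` is -1, and `r_curve` is 0 even though $Y$ is completely determined by $X$, which shows that a zero coefficient rules out only a linear relationship.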



Scatter Diagram


Definition

A Scatter Diagram, also known as a scatter plot, is a basic and essential graphical tool used to visualize the relationship between two quantitative variables. It is constructed by plotting pairs of observations from two variables, say $X$ and $Y$, as points on a two-dimensional Cartesian coordinate system.

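In a scatter diagram, one variable is plotted along the horizontal axis and the other along the vertical axis, and each point represents one pair of observations $(x_i, y_i)$. The pattern formed by the collection of plotted points provides a visual representation of the relationship between the two variables.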


Construction

To construct a scatter diagram for a set of paired observations $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$:

  1. Draw and Label Axes:

    Draw two perpendicular axes, the horizontal axis (x-axis) and the vertical axis (y-axis), intersecting at the origin (0). Label the x-axis with the name of the first variable and the y-axis with the name of the second variable. Include units if applicable.

  2. Determine Scales:

    Choose appropriate scales for both axes based on the range of values for each variable in your dataset. The scales should be chosen such that all data points fit comfortably on the graph and the scatter is clearly visible. The axes do not necessarily have to start at zero, especially if the data values are far from zero; use a break in the axis if needed.

  3. Plot Points:

    For each pair of observations $(x_i, y_i)$, locate the value $x_i$ on the x-axis and the value $y_i$ on the y-axis. Plot a single point where the vertical line through $x_i$ meets the horizontal line through $y_i$. Repeat this for all $n$ pairs of observations.

  4. Add Title:

    Give the scatter diagram a clear, concise title that describes the relationship being visualized (e.g., "Relationship between Maths Score and Physics Score").
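
The four steps above can be sketched in code. The Python example below builds a coarse text-only scatter diagram, using `range` objects as the axis scales. The function name `text_scatter`, the snapping of values to the nearest tick, and the sample data are all assumptions made for illustration; a plotting library or graph paper would normally be used instead.

```python
def text_scatter(points, x_axis, y_axis, title):
    """Build a coarse text scatter diagram as a list of lines.

    x_axis and y_axis are range objects giving the scale of each axis
    (steps 1 and 2). Every point is assumed to lie within both scales;
    values are snapped to the nearest tick.
    """
    grid = [[" ." for _ in x_axis] for _ in y_axis]
    for x, y in points:
        col = round((x - x_axis.start) / x_axis.step)
        row = round((y - y_axis.start) / y_axis.step)
        grid[row][col] = " *"                      # step 3: plot the point
    lines = [title]                                # step 4: add the title
    for row in reversed(range(len(y_axis))):       # largest y value on top
        lines.append(f"{y_axis[row]:>4} |" + "".join(grid[row]))
    lines.append("     +" + "--" * len(x_axis))
    lines.append("      " + "".join(f"{tick:>2}" for tick in x_axis))
    return lines

# Hypothetical paired observations: (hours studied, quiz score out of 10).
points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 7), (6, 8)]
diagram = text_scatter(points, range(0, 8), range(0, 10),
                       "Hours Studied (x) vs Quiz Score (y)")
print("\n".join(diagram))
```

Each `*` marks one plotted pair. Passing the scales in explicitly mirrors step 2, where the scale is chosen from the range of the data before any point is plotted.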


Interpretation

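Interpreting the pattern of points in a scatter diagram is a crucial first step in analyzing the relationship between two variables. Look for the direction of the pattern (upward or downward), its form (roughly linear or curved), how tightly the points cluster around that form, and any points that lie far from the rest.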

A scatter diagram provides a quick visual summary and guides the choice of appropriate quantitative correlation methods.


Example

Example 1. The scores obtained by 6 students in Maths (x) and Physics (y) in a test are given as pairs of (Maths Score, Physics Score): (80, 75), (60, 65), (90, 85), (50, 55), (70, 70), (95, 90). Draw a scatter diagram for this data and interpret it.

Answer:

Given: Paired scores of 6 students in Maths and Physics.

To Draw: A scatter diagram.

To Interpret: The scatter diagram.

Solution:

  1. Choose Axes: Let the Maths Score be on the x-axis and the Physics Score be on the y-axis.
  2. Determine Scales: The Maths scores range from 50 to 95, and Physics scores range from 55 to 90. We can choose a scale for both axes starting from, say, 40 and going up to 100, with increments of 10.
  3. Plot Points: Plot the 6 given points on the graph paper: (80, 75), (60, 65), (90, 85), (50, 55), (70, 70), (95, 90).
  4. Label Axes and Title: Label the x-axis "Maths Score" and the y-axis "Physics Score". Add the title "Scatter Diagram of Maths Score vs. Physics Score".
Scatter diagram of Maths scores vs Physics scores for 6 students

Interpretation:

Observing the pattern of the points in the scatter diagram, we can see that:

  • The points appear to cluster roughly along a straight line.
  • This line slopes upwards from the lower left to the upper right corner of the graph.

This pattern suggests a **positive linear relationship** between Maths scores and Physics scores for this group of students. Students who scored higher in Maths generally tended to score higher in Physics as well, and students who scored lower in Maths tended to score lower in Physics.

The points lie close to a straight line, indicating a strong positive linear correlation.
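
As a numerical cross-check on this visual reading, the sketch below applies the commonly used raw-sums form of Karl Pearson's coefficient to the six score pairs. The formula is not stated in this section, so its use here is an assumption, and the variable names are purely illustrative.

```python
import math

# Scores from Example 1: (Maths, Physics) for the 6 students.
pairs = [(80, 75), (60, 65), (90, 85), (50, 55), (70, 70), (95, 90)]

n = len(pairs)
sum_x = sum(x for x, _ in pairs)
sum_y = sum(y for _, y in pairs)
sum_xy = sum(x * y for x, y in pairs)
sum_x2 = sum(x * x for x, _ in pairs)
sum_y2 = sum(y * y for _, y in pairs)

# Raw-sums (shortcut) form of Pearson's coefficient.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 3))
```

For these data the numerator is 6700 and the coefficient comes out at roughly 0.99, which agrees with the upward, nearly straight pattern seen in the diagram.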




Methods of Measuring Correlation (Karl Pearson's Coefficient - Implicit)


Quantifying Linear Relationship

While a scatter diagram provides a visual assessment of the relationship between two quantitative variables, it does not give a precise numerical measure of the strength and direction of the linear association. To quantify this linear relationship, we use a statistical measure called a correlation coefficient.

The most widely used method for measuring the strength and direction of a **linear relationship** between two quantitative variables is **Karl Pearson's Product-Moment Correlation Coefficient**.

Karl Pearson's Product-Moment Correlation Coefficient ($r$)

Pearson's $r$ is suitable for quantitative data measured on interval or ratio scales when the relationship is believed to be approximately linear and the data does not contain extreme outliers that could heavily influence the mean and standard deviation.
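
The defining formula is not reproduced in this section. For reference, the standard textbook form for $n$ paired observations $(x_i, y_i)$ with means $\bar{x}$ and $\bar{y}$ is sketched below, followed by the equivalent raw-sums form often used for hand calculation. The notation is an assumption and may differ from the notation used elsewhere in the course.

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$$

The numerator is positive when above-average values of $X$ tend to occur with above-average values of $Y$, which gives the sign of $r$ its directional meaning. The denominator scales the result so that $-1 \le r \le +1$.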



Rank Correlation (Spearman's Rank Correlation Coefficient - Implicit)


Measuring Monotonic Relationship

While Pearson's correlation coefficient measures the strength of a **linear** relationship, sometimes we are interested in measuring the strength of a **monotonic** relationship. A monotonic relationship is one where, as one variable increases, the other consistently tends to increase (or consistently tends to decrease), but not necessarily at a constant rate (i.e., not necessarily in a straight line).

Spearman's Rank Correlation Coefficient is a non-parametric measure that assesses the strength and direction of the monotonic association between two variables.

Spearman's correlation is essentially Pearson's correlation calculated on the ranks of the data values rather than the raw values themselves.
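
The idea of "Pearson on the ranks" can be sketched directly in Python. The helper names `average_ranks`, `pearson_r`, and `spearman_rs`, the convention of giving tied values the mean of their ranks, and the sample data are assumptions made for illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1      # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means (assumes non-constant data)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(xs, ys):
    """Spearman's coefficient: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]              # illustrative values only
ys = [x ** 3 for x in xs]         # always increasing, but along a curve
r = pearson_r(xs, ys)
rs = spearman_rs(xs, ys)
```

Because $Y$ always increases with $X$ here, the ranks agree exactly and `rs` equals 1, while `r` is about 0.94 because the points do not lie on a straight line.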

Spearman's Rank Correlation Coefficient ($r_s$)

Spearman's rank correlation is a valuable non-parametric alternative when assumptions for Pearson's $r$ (linearity, normally distributed variables) are not met, or when dealing with ordinal data or potential outliers.
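
The formula for $r_s$ is not reproduced in this section. For reference, the usual textbook shortcut, which is exact only when there are no tied ranks, is sketched below. Here $d_i$ denotes the difference between the two ranks of the $i$-th observation and $n$ is the number of pairs; this notation is an assumption.

$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

For the six students in Example 1, ranking the Maths scores and the Physics scores puts the students in the same order, so every $d_i = 0$ and $r_s = 1 - \frac{6 \times 0}{6(36 - 1)} = 1$.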